Kasper Hornbæk
Biography
Kasper Hornbæk received his M.Sc. and Ph.D. in Computer Science from the University of Copenhagen, in 1998 and 2002, respectively. Since 2009 he has been a professor with special duties in Human-centered Computing at the University of Copenhagen. His core research interests are human-computer interaction, usability research, search user interfaces, and information visualization; detours include eye tracking, cultural usability, and reality-based interfaces. Kasper serves on the editorial boards of Journal of Usability Studies, Interacting with Computers, and International Journal of Human-Computer Studies (IJHCS). He has published at CHI, UIST, ACM Transactions on Computer-Human Interaction, and Human-Computer Interaction, and won IJHCS’s most cited paper award 2006-2008.
----
Article: Current practice in measuring usability: Challenges to usability studies and research
Journal: International Journal of Human-Computer Studies
Peer reviewed: Yes
Year: 2006
Hornbæk, K. (2006). "Current practice in measuring usability: Challenges to usability studies and research." International Journal of Human-Computer Studies 64(2): 79-102.
Summary
Reviews how usability is currently measured and identifies problems with those measurements. Discusses challenges for usability studies and usability research.
Motivations for studying how to measure usability
#The definition of usability is largely determined by how we measure it. Knowing how to measure it makes the concept more manageable.
#"Usability cannot be directly measured." We measure aspects of usability, but the things we choose to measure may not be "valid indicators".
Studying usability measures may lead to uncovering invalid measures.
#Benchmarking of programs raises the issue of which measures to use in the design and development of software
Previous descriptors for SW quality
: Ergonomics (Shackel, 1959)
: Ease-of-use (Miller, 1971; Bennett, 1972)
: Usability (Bennett, 1979; Shackel, 1981)
Recent discussions "fueled" by perceived limitations of common measures
: Users, designers, and owners don't equally value measures of time (Dillon 2001)
: Common measures don't account for "hedonic quality", i.e. intangibles "such as originality, innovativeness, beauty" (Hassenzahl et al 2000)
: New contexts will likely require new measures to capture what is important in context (various)
: -- Technology supporting learning (Soloway et al, 1994)
: New measures continually arise, including fun, aesthetics, apparent usability, sociability, flow (various)
: -- Flow (Hoffman and Novak, 1996)
Method of Review
: Review based on a "broad conception of usability", similar to that of ISO 9241, part 11 (1998) and Bevan (1995)
: Includes studies that do not mention "usability"
: Only full-length, original research papers
Three criteria for inclusion:
: - Study reports quantified data on usability in Method or Results section
: - Study evaluates the quality of interaction between users and interfaces
: - Study describes relational comparisons (Rosenthal and Rosnow, 1991) between different cases
Classified measures by Effectiveness, Efficiency, and Satisfaction
: Each group subdivided based on prominent usability measures (Nielsen, 1993) (Dix et al, 1993) (Shneiderman, 1998), behavioral science (Meister, 1985), and "well-known discussions of usability measures" (Whiteside et al, 1988, Sweeney et al, 1993)
Measures of Effectiveness
*Binary task completion - whether or not users complete tasks
*Accuracy - quantify number of errors
**Spatial accuracy - used in studies of input devices
**Precision - ratio of number of correct documents retrieved to total retrieved
*Recall - how much info users can recall after having used the interface
*Completeness - extent to which tasks are solved (orthogonal to accuracy)
*Quality of Outcome - more complex than accuracy
*Other
Measures of Efficiency
*Time - how long to complete task with the interface
**Task completion time
**Time in mode - time in help function, time in different parts of interface
**Time until event - time to first key press, time to first relevant node
*Input Rate - text entry speed, or throughput
*Mental Effort
*Usage Patterns - measures of how the interface is used
**Use frequency - number of times a certain action is performed
**Information accessed - number of objects visited in virtual space
**Deviation from optimal solution - ratio of actual distance covered to shortest distance
*Communication Effort - resources users expend in communication
*Learning - use changes in efficiency as indicator of learning
*Other
Measures of Satisfaction
*Preference
**Rank preferred interface
**Rate preference for interface
**Behavior in interaction
*Satisfaction with the interface
**Ease-of-use
**Context-dependent questions
**Before use
**During use
**Specific Attitudes
*Users' Attitudes and Perceptions
**Attitudes towards other persons
**Attitudes towards content
**Perception of Outcomes
**Perception of Interaction
*Other
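Two of the ratio measures listed above (precision under effectiveness, deviation from optimal solution under efficiency) can be sketched in a few lines. This is only an illustration of the definitions as given in these notes, not code from the paper; the function names and the example values are assumptions made for the sketch.

```python
from typing import Iterable, Set


def precision(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Ratio of correct documents retrieved to total documents retrieved."""
    retrieved = list(retrieved)
    if not retrieved:
        raise ValueError("precision is undefined when nothing was retrieved")
    correct = sum(1 for doc in retrieved if doc in relevant)
    return correct / len(retrieved)


def deviation_from_optimal(actual_distance: float, shortest_distance: float) -> float:
    """Ratio of the distance actually covered to the shortest possible distance.

    A value of 1.0 means the user followed the optimal path; larger values
    mean a longer detour.
    """
    if shortest_distance <= 0:
        raise ValueError("shortest_distance must be positive")
    return actual_distance / shortest_distance


# Hypothetical example values, invented for illustration only.
example_precision = precision(["a", "b", "c", "d"], {"a", "c", "x"})
example_deviation = deviation_from_optimal(150.0, 100.0)
```

Both functions return plain ratios, so they can be reported per task and kept separate from time-based measures.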
Measures of Specific Attitudes Towards the Interface
*Annoyance
*Anxiety
*Complexity
*Control
*Engagement
*Flexibility
*Fun
*Intuitive
*Learnability
*Liking
*Physical Discomfort
*Want to use again
Subjective vs. Objective measures (Challenges in Measuring Usability)
*This distinction is widely used (Meister, 1985; Yeh and Wickens, 1988; Muckler and Seven, 1992)
*The two kinds of measures may lead to different conclusions
**Objective vs. Subjective time (Tractinsky and Meyer, 2001) (Czerwinski et al., 2001) (Eisler, 1976)
**Objective vs. Subjective workload (Yeh and Wickens, 1988)
**Objective vs. Subjective employee performance (Bommer et al, 1995)
*Using both may give a more complete picture
*We are interested not only in improving objective performance, but also in improving users’ experience of interaction.
*Which kind is appropriate may depend on the intended context of use
*May help researchers consider whether non-typical measures are relevant
*Balanced focus may help improve both the user experience and objective performance
*Pay special attention to whether subjective or objective measures are appropriate, or whether a mix is better
*Do not conflate subjective and objective measures.
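The advice above (collect both kinds of measures, report them separately, and expect that they may disagree) can be sketched as follows. This is a minimal illustration and not the paper's method: the `Trial` record, the field names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class Trial:
    interface: str
    measured_time_s: float   # objective: time logged by the test software
    estimated_time_s: float  # subjective: duration estimated by the user


def summarize(trials: List[Trial]) -> Dict[str, Dict[str, float]]:
    """Mean objective and subjective time per interface, kept in separate fields."""
    grouped: Dict[str, List[Trial]] = {}
    for trial in trials:
        grouped.setdefault(trial.interface, []).append(trial)
    return {
        name: {
            "objective_mean_s": mean(t.measured_time_s for t in group),
            "subjective_mean_s": mean(t.estimated_time_s for t in group),
        }
        for name, group in grouped.items()
    }


def fastest(summary: Dict[str, Dict[str, float]], key: str) -> str:
    """Name of the interface with the lowest mean for the given measure."""
    return min(summary, key=lambda name: summary[name][key])


# Hypothetical data, invented to show the two measures disagreeing.
trials = [
    Trial("A", 40.0, 60.0),
    Trial("A", 44.0, 64.0),
    Trial("B", 50.0, 45.0),
    Trial("B", 54.0, 47.0),
]
summary = summarize(trials)
objective_winner = fastest(summary, "objective_mean_s")
subjective_winner = fastest(summary, "subjective_mean_s")
```

With this invented data, interface A is faster by the clock while interface B feels faster to users, which is the kind of divergence that is lost when the two measures are merged into one score.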
Measures of Learnability and Retention (Challenges in Measuring Usability)
*Relevant to compare measures of efficiency to recommendations on how to measure usability
*At issue is the completeness of the measures used
*Shneiderman (1998) recommends measuring
**Time to learn
**Speed of performance
**Rate of user errors
**Retention over time
**Subjective satisfaction
*Nielsen (1993) recommends similar measures
*Completion time, accuracy, and satisfaction are common measures (92% of studies use at least 1, 13% use all 3)
*Learnability only measured in 5 of the 180 studies
*Retention only measured in 1
*Future studies could emphasize measures of learning, such as time needed to reach a certain level of proficiency
*Retention--the ability of users to come back and successfully use the interface--is important in gaining a more complete picture of usability
*Need to develop easy-to-adopt techniques to measure learning and retention
Usability over Time (Challenges in Measuring Usability)
*Users typically only interact briefly with interfaces being tested
*Median duration: 30 minutes
*Only 13 studies had interactions of more than 5 hours
**Long term study: McGrenere et al (2002)
*We have little quantitative evidence about what long-term usable systems are like
*We know little about how usability develops over time
*Would be relevant to know how measures of effectiveness and satisfaction develop over time
*Do results of "snapshot" studies remain relatively constant over time?
Extending, Validating, and Standardizing Measures of Satisfaction (Challenges in Measuring Usability)
*Disarray of measures of satisfaction
*93% of studies exclusively used Likert-like, post-use questionnaires
*Problems with questionnaires:
**Collected after the fact
**Shaped by (mis)understanding of the questions
**General information which is hard to link to specific parts of interface
*Some studies tried to extend satisfaction measures to avoid above problems
**Nichols et al (2000) - included "startle" event to
measure presence
**Izsó and Láng (2000) - measured heart period variability to measure task complexity
**Tattersall and Foord (1996) collected satisfaction ratings during use
**Physiological measures of usability (Wastell, 1990) (Mullins and Treu, 1991) (Allanson and Wilson, 2002)
*Need to validate and standardize satisfaction measures
**Validated questionnaires available for some constructs:
***Anxiety (Bailey et al, 2001)
***Presence (Sallnäs et al, 2001)
***Self-disclosure (Dahlbäck et al, 2001)
***Ease-of-use (Chin et al, 1988) (Davis, 1989)
**Some degree of validity achieved by building on measurement instruments from previous work
***Entertainment (O'Keefe et al, 2000)
***Subjective responses to computer-mediated conversation (Garau et al, 2001)
**Ease-of-use measures reinvented again and again
**Some existing questionnaires partly cover specific attitudes
***SUMI has subscales on control and learnability (Kirakowski and Corbett, 1993)
***QUIS assesses how easily users learn system (Chin et al, 1988)
**Skepticism of using standardized questionnaires
***Not applicable in context-of-use
***Unnecessarily limiting scope of usability studies
**Comparing studies with standardized questionnaires is easier than comparing the studies reviewed
**Results from studies with standardized questionnaires allow greater confidence
Studies of Correlations between Measures (Challenges in Measuring Usability)
*Need to study correlation between usability measures
*Needed within all ISO categories of usability aspects
*Also need studies between aspects, e.g. what aspects of efficiency contribute to usability, but are not captured by effectiveness
*Karat et al (2001) correlated mouse activity with satisfaction measures
*MacKenzie et al (2001) correlated new proposals for usability measures via throughput
*Frøkjær et al (2000) argue satisfaction is not always correlated with effectiveness
**What does this signify in a particular context?
**Are we ignoring critical aspects of effectiveness?
**Are we ignoring critical aspects of satisfaction?
**Are we looking at too short-term use?
*In usability research there is a need for better understanding of the relation between measures
**Help investigate when objective and subjective measures uncover different aspects of usability
**Important to justify relevance and necessity of new measures
Micro and Macro Measures of Usability (Challenges in Measuring Usability)
*Same measure of usability may be classified differently, depending on the level of task at which it is considered
*Micro level
**Tasks of short duration (seconds to minutes)
**Manageable complexity (most people get them right)
**Often focus on perceptual or motor aspects
**Time usually a critical resource
*Macro level
**Longer duration (hours, days, months)
**Cognitively or socially complex
**Display large individual differences in interaction process
**Vast variations in outcome
**Usually have effectiveness and satisfaction as critical parameters
*Macro perspective is rare
*Few studies use tasks that allow quality of outcome and learning to be measured
*Dubious to believe we can safely decompose macro tasks into micro tasks and reason only about micro usability
*Grand goals of user interfaces seem unlikely to be evaluated if we focus on micro measures
**These goals seem to involve psychological and social complexities visible only in macro tasks
**Stimulate creativity (Shneiderman, 2000)
**Support sociability on internet (Preece, 2000)
*Task completion times:
**Short times crucial in studies of input devices (micro)
**Long times indicative of motivation and engagement (macro) (Inkpen, 2001)
Working Model (Challenges in Measuring Usability)
*Model suggests questions and measures to consider
*6 categories of particular importance
**Objective measures
***Expert assessment, comprehension
***Time, usage patterns, learnability
***Physiological usability, reflex responses
**Subjective measures
***Users' perception of outcome
***Subjectively experienced duration, mental workload, perception of task
difficulty
***Validated questionnaires
*In conducting usability studies, useful to check
#If above measures are relevant
#Whether measures from all 6 categories can be obtained and are useful
#Whether questions below inspire more valid or complete measures
*Research questions related to usability measures
**Long-term use and development over time?
**Macro or micro perspectives on tasks?
**Relations between measures?
**Valid and standardized measures?
*Proposed model as an improvement over the ISO 9241 part 11 usability standard
**Quality or satisfaction with outcomes of interactions instead of effectiveness
***Include user perceptions of work products
***Include perceptions of whether they reached intended outcomes
***Outcomes include tangible outcomes (work products) and intangible ones (changes in attitude, having fun, improving personal relations)
**ISO standard suggests effectiveness measures may be combined to form measures of accuracy and completeness
***recommend reporting such measures separately
**ISO definition of efficiency mixes effectiveness and efficiency measures
***suggest not to involve accuracy and completeness in efficiency measures
***suggest reporting efficiency measure per task or goal
***instead of ISO efficiency, makes more sense to use measures of interaction process
***differentiate between subjectively experienced and objectively measured aspects of interaction process
**Instead of ISO satisfaction definition (freedom from discomfort and positive attitudes towards the use of the product), use measures of users' attitudes and experience
**Keep objective and subjective measures distinct (don't mix/combine)
**Insistence on using macro tasks will lead to bolder and more challenging measures
Conclusion
*Notable problems in how usability measures are employed
#few studies use measures of quality of interaction
#25% do not assess outcome of users' interaction, leaving claims of usability unsupported
#measures of learning and retention are rarely employed
#measures of interaction sometimes used as measures of quality of use, despite weak relation
#measures of users' satisfaction with interfaces are in disarray
#Some studies mix together, or treat as synonymous, perceptions of phenomena and objective measures of those phenomena
*Challenges with measuring usability
**understand better the relationship between objective and subjective measures
**understand better how to measure learnability and retention
**extend satisfaction measures beyond post-use questionnaires
**study correlations between measures
**push boundaries of concepts of usability measures by focusing on macro measures
*Limitations of paper
**Analyzed research studies, not problems and challenges of usability testing in software industry
**Did not account in detail for how measures are used in different contexts or how different tasks and domains impact the choice of measures
**Does not suggest usability can be fully accounted for outside of a particular context
Future research
*Develop subjective measures for areas currently measured mainly by objective measures, and vice versa, especially...
**outcome quality vs. perceived outcome
**time vs. subjectively experienced duration
**perceived learnability vs. changes in time to complete tasks (I really like this)
**objective satisfaction vs. subjective satisfaction questionnaires
Definitions
Usability: “the capability to be used by humans easily and effectively” (Shackel, 1991, p. 24)
Usability: “quality in use” (Bevan, 1995)
Usability: “the effectiveness, efficiency, and satisfaction with which specified users can achieve goals in particular environments” (ISO, 1998, p. 2).
Effectiveness: “accuracy and completeness with which users achieve specified goals” (ISO, 1998, p. 2).
Efficiency: “resources expended in relation to the accuracy and completeness with which users achieve goals” (ISO, 1998, p. 2).
Satisfaction: “freedom from discomfort, and positive attitudes towards the use of the product” (ISO, 1998, p. 2).
Usability: ... context dependent (Newman and Taylor, 1999)
Usability: shaped by interaction between tools, problems, and people (Naur, 1965)
Related Ideas
Guidelines for improving usability of systems (Smith and Mosier, 1986)
Methods for predicting usability problems (Molich and Nielsen, 1990) (Wharton et al, 1994)
Techniques to test usability of systems (Lewis, 1982)
How to measure usability (Nielsen and Levy, 1994) (ISO, 1998) (Frøkjær et al, 2000)
Papers from CHI conference and Interact "serve as exemplary cases of research in HCI"
Effectiveness of computer-based tutors (Corbett and Anderson, 2001)
Abstract
How to measure usability is an important question in HCI research and user interface evaluation. We review current practice in measuring usability by categorizing and discussing usability measures from 180 studies published in core HCI journals and proceedings. The discussion distinguishes several problems with the measures, including whether they actually measure usability, if they cover usability broadly, how they are reasoned about, and if they meet recommendations on how to measure usability. In many studies, the choice of and reasoning about usability measures fall short of a valid and reliable account of usability as quality-in-use of the user interface being studied. Based on the review, we discuss challenges for studies of usability and for research into how to measure usability.
The challenges are to distinguish and empirically compare subjective and objective measures of usability; to focus on developing and employing measures of learning and retention; to study long-term use and usability; to extend measures of satisfaction beyond post-use questionnaires; to validate and standardize the host of subjective satisfaction questionnaires used; to study correlations between usability measures as a means for validation; and to use both micro and macro tasks and corresponding measures of usability. In conclusion, we argue that increased attention to the problems identified and challenges discussed may strengthen studies of usability and usability research. © 2005 Elsevier Ltd. All rights reserved.